Senior Site Reliability Engineer (SRE) | Banking

  • Quarry Bay
  • Permanent
  • Wed Jun 10 04:55:22 2026
  • BBBH59741

Our investment bank client is looking for a Senior SRE to join their infrastructure team. The role focuses on monitoring, Kubernetes reliability, and observability to ensure resilient, scalable, high-performing platforms.


Key Responsibilities

  • Lead reliability and observability across platforms, ensuring high availability and performance
  • Design, implement, and enhance monitoring solutions using tools such as Prometheus, Grafana, and Elasticsearch
  • Develop alerting strategies, dashboards, and end-to-end observability pipelines
  • Diagnose complex production incidents through log analysis, troubleshooting, and root cause investigation
  • Manage and optimize Kubernetes environments, including health checks, scaling, and workload stability
  • Administer Linux systems (RHEL), covering upgrades, patching, and performance tuning
  • Collaborate with engineering, infrastructure, and application teams to strengthen system resilience and scalability
  • Maintain logging pipelines, including ingestion, parsing, and routing into search/analytics platforms
  • Continuously evaluate and adopt modern SRE tools, practices, and automation approaches
  • Participate in on-call rotations for production support, including off-hours coverage

Key Requirements

  • Degree in Computer Science, Engineering, or related field
  • 8–10 years’ experience in SRE, platform engineering, or production support environments
  • Strong hands-on expertise in monitoring and observability tools (e.g., Prometheus, Grafana, Elasticsearch, Kibana)
  • Proven experience building metrics pipelines, exporters, and integrations with long-term storage systems
  • Solid experience with automation and scripting (Python, Bash, Ansible, CI/CD pipelines)
  • Experience managing log processing pipelines (e.g., ingestion, filtering, enrichment)
  • Proficient in designing dashboards and analytics for distributed systems
  • Strong Linux administration knowledge, including troubleshooting and system optimization
  • Hands-on Kubernetes experience (operations, orchestration, scaling, and troubleshooting)
  • Understanding of SRE principles, incident management, high availability, and disaster recovery
  • Knowledge of networking concepts and distributed system performance tuning
  • Exposure to GPU-based or AI/ML infrastructure is advantageous
  • Self-driven, adaptable, and capable of handling multiple priorities in a fast-paced environment
  • Fluent in English; Cantonese and Mandarin language skills are a plus

“Sanderson-iKas” is the brand name for the following companies incorporated in Hong Kong: Sanderson Solutions International (Hong Kong) Limited (Business Registration no. 53741924) and iKas International (Asia) Limited (Business Registration no. 39818987).

Website: www.sanderson-ikas.hk